Abstract

Classical messages can be sent via a noisy quantum channel in various ways, 
corresponding to various choices of ensembles of signal states of the channel. 
Previous work by Holevo and by Schumacher and Westmoreland relates the 
capacity of the channel to the properties of the signal ensemble. Here we 
describe some properties characterizing the ensemble that maximizes the 
capacity, using the relative entropy "distance" between density operators to
give the results a geometric flavor. 

1 Communication via quantum channels 

Suppose Alice wishes to send a (classical) message to Bob, using a quantum
system as the communication channel. Alice prepares the system in the "signal
state" $\rho_k$ with probability $p_k$, so that the ensemble of states is described
by an average density operator $\rho = \sum_k p_k \rho_k$. Bob makes a measurement of
a "decoding observable" on the system and uses the result to infer which
signal state was prepared. The choice of system preparation (represented by 
the index k) and Bob's measurement outcome are the input and output of a 
classical communication channel. 

Holevo [1] proved (as Gordon [2] and Levitin [3] had previously conjectured)
that the mutual information between the input and output of this
channel, regardless of Bob's choice of decoding observable, can never be
greater than $\chi$, where

$$\chi = S(\rho) - \sum_k p_k S(\rho_k) \tag{1}$$

and $S(\rho) = -\mathrm{Tr}\,\rho \log \rho$ is the von Neumann entropy of the density operator
$\rho$.

More recently, it has been shown by Holevo [4] and by Schumacher and
Westmoreland [5] that the Holevo bound is asymptotically achievable. That
is, if Alice uses many copies of the same channel, preparing long code words
of signal states, and if Bob chooses an entangled decoding observable, Alice
can convey to Bob up to $\chi$ bits of information per use of the channel, with
arbitrarily low probability of error. (This fact was first shown for pure state
signals in [6].)

Suppose the channel is a noisy one described by a superoperator $\mathcal{E}$. Then
if Alice prepares the input signal state $\rho_k$, Bob will receive the output signal
state $\mathcal{E}(\rho_k)$. It is the ensemble of output signal states that determines the
capacity of the channel. Effectively, the superoperator $\mathcal{E}$ restricts the set
of signals that Alice can present to Bob for decoding. If $\mathcal{B}$ is the set of all
density operators, then Alice's efforts can only produce output states in the
set $\mathcal{A} = \mathcal{E}(\mathcal{B})$.

In this paper we will consider the problem of maximizing $\chi$ for ensembles
of states drawn from a given set $\mathcal{A}$ of available states. This includes the
problem of maximizing $\chi$ for the outputs of a noisy channel, if $\mathcal{A}$ is chosen
to be the set of possible channel outputs. In this case, $\mathcal{A}$ will be a convex
set; but we will not need the convexity of $\mathcal{A}$ for many of our results.

2 Relative entropy 

If $\rho$ and $\sigma$ are density operators, then the relative entropy of $\rho$ with respect
to $\sigma$ is defined to be

$$\mathcal{D}(\rho\|\sigma) = \mathrm{Tr}\,\rho\log\rho - \mathrm{Tr}\,\rho\log\sigma. \tag{2}$$

Here are three important points about the relative entropy:

• $\mathcal{D}(\rho\|\sigma) \geq 0$, with equality if and only if $\rho = \sigma$.

• Strictly speaking, $\mathcal{D}(\rho\|\sigma)$ is defined only if supp $\rho \subseteq$ supp $\sigma$ (where
"supp $\rho$" is the support of the operator $\rho$). If this is not the case, then
we take $\mathcal{D}(\rho\|\sigma) = \infty$. For example, if $\rho$ and $\sigma$ are distinct pure states,
the relative entropy is always infinite.

• The relative entropy is jointly convex in its arguments:

$$\mathcal{D}(p_1\rho_1 + p_2\rho_2 \| p_1\sigma_1 + p_2\sigma_2) \leq p_1\mathcal{D}(\rho_1\|\sigma_1) + p_2\mathcal{D}(\rho_2\|\sigma_2) \tag{3}$$

for $p_1, p_2 \geq 0$ with $p_1 + p_2 = 1$. From this fact it also follows that the
relative entropy is convex in each of its arguments.

The relative entropy plays a role in the asymptotic distinguishability of quantum
states by measurement [7], and has been used to develop measures of
quantum entanglement [8].

It is often convenient to think of the relative entropy $\mathcal{D}(\rho\|\sigma)$ as a "directed
distance" from $\sigma$ to $\rho$, even though it lacks some of the properties of
a true metric. This view of the relative entropy will let us give a geometric 
interpretation to our results. 

Suppose as before we have an ensemble of signal states in the available 
set $\mathcal{A}$, in which $\rho_k$ appears with probability $p_k$. It is easy to verify that the
Holevo bound $\chi$ can be given in terms of the relative entropy:

$$\chi = \sum_k p_k \mathcal{D}(\rho_k\|\rho). \tag{4}$$

That is, $\chi$ is just the average of the relative entropy of the members of the
signal ensemble with respect to the average signal state.
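
The agreement between Equations 1 and 4 can be checked numerically. The following is a minimal sketch, assuming NumPy and two arbitrarily chosen qubit density matrices; the helper names `entropy_bits` and `relative_entropy_bits` and the example states are illustrative assumptions, not part of the original text.

```python
import numpy as np

def entropy_bits(rho):
    """Von Neumann entropy S(rho) = -Tr rho log2 rho, in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # treat 0 log 0 as 0
    return float(-np.sum(evals * np.log2(evals)))

def relative_entropy_bits(rho, sigma):
    """D(rho||sigma) in bits; this sketch assumes sigma has full rank."""
    s_vals, s_vecs = np.linalg.eigh(sigma)
    log_sigma = s_vecs @ np.diag(np.log2(s_vals)) @ s_vecs.conj().T
    # Tr rho log rho equals -S(rho)
    return float(-entropy_bits(rho) - np.trace(rho @ log_sigma).real)

# Illustrative ensemble (assumed values, chosen only for this check).
probs = [0.3, 0.7]
states = [np.array([[0.9, 0.1], [0.1, 0.1]]),
          np.array([[0.4, -0.2], [-0.2, 0.6]])]
avg = sum(p * r for p, r in zip(probs, states))

# Equation 1: entropy of the average minus the average entropy.
chi_eq1 = entropy_bits(avg) - sum(p * entropy_bits(r) for p, r in zip(probs, states))
# Equation 4: average relative entropy to the average state.
chi_eq4 = sum(p * relative_entropy_bits(r, avg) for p, r in zip(probs, states))
print(chi_eq1, chi_eq4)
```

The two printed values should agree to within floating-point error for any ensemble whose average state has full rank.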

3 The optimal signal ensemble 

To maximize the information capacity of the channel, Alice will want to 
choose a signal ensemble that maximizes $\chi$. We will denote the maximum of
$\chi$ for a given set $\mathcal{A}$ of available states by $\chi^*$. Any ensemble of signal states
that achieves this value of the Holevo bound will be called an optimal signal 
ensemble. 

If the set of available states $\mathcal{A}$ is a closed convex set, then we can always
take an optimal ensemble to be composed of extreme points of $\mathcal{A}$, that is,
states which cannot be written as convex sums of other states in $\mathcal{A}$. To
see this, suppose we have an ensemble of $\mathcal{A}$-states with average state $\rho$, and
further suppose that $\rho_k$ is a member of the ensemble that is not an extreme
point. This means that there are states $\rho_{k0}$ and $\rho_{k1}$ in $\mathcal{A}$ such that

$$\rho_k = q_0\rho_{k0} + q_1\rho_{k1} \tag{5}$$

for probabilities $q_0$ and $q_1$ that sum to unity. By the convexity of the relative
entropy,

$$\mathcal{D}(\rho_k\|\rho) \leq q_0\mathcal{D}(\rho_{k0}\|\rho) + q_1\mathcal{D}(\rho_{k1}\|\rho). \tag{6}$$

Since $\chi$ is the average of the relative entropies, we will never make $\chi$ smaller
by replacing $\rho_k$ (with probability $p_k$) by $\rho_{k0}$ and $\rho_{k1}$ (with probabilities $p_k q_0$
and $p_k q_1$, respectively) in the ensemble. Thus, at least one optimal ensemble
will be composed of extreme points of $\mathcal{A}$.

For noisy channels, this means that pure state inputs to the channel are
optimal; that is, it never increases $\chi$ to use mixed states as inputs. This
fact was shown in ||. 

A second and very surprising fact was discovered by Fuchs [9]. The quantity
$\chi$ is a measure of the distinguishability of an ensemble of signal states.
If we wish to maximize the distinguishability of the output signals of a noisy
channel, we might imagine that we should always maximize the distinguishability
of the input signals, i.e., choose an orthogonal set of input states. But
this intuition turns out to be false.

Some insight can be gained by examining a specific counter-example. Our 
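quantum system is a spin, and $|\uparrow\rangle$ and $|\downarrow\rangle$ represent eigenstates of $S_z$. The
spin is subject to "amplitude damping", so that an initial density operator
$\rho$ evolves into a density operator

$$\rho' = \mathcal{E}(\rho) = A_1 \rho A_1^\dagger + A_2 \rho A_2^\dagger \tag{7}$$

where $A_1 = \sqrt{1-\lambda}\,|\uparrow\rangle\langle\uparrow| + |\downarrow\rangle\langle\downarrow|$ and $A_2 = \sqrt{\lambda}\,|\downarrow\rangle\langle\uparrow|$, and $0 \leq \lambda \leq 1$. The
result of this operation is, for instance, to leave the state $|\downarrow\rangle$ unchanged but
to cause $|\uparrow\rangle$ to decay to $|\downarrow\rangle$ with probability $\lambda$. We choose $\lambda = 1/2$. A
diagram of this process in the Bloch sphere is found in Figure 1.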

If we consider only orthogonal input signal ensembles, the maximum $\chi$
is obtained for an equally weighted ensemble of $|\rightarrow\rangle$ and $|\leftarrow\rangle$, for which
$\chi = 0.4567$ bits. But a non-orthogonal ensemble of the states $|\phi_0\rangle$ and $|\phi_1\rangle$
can achieve 0.4717 bits, where the angle in Hilbert space between the two
inputs is about 80°.
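
The numbers quoted for this counter-example can be explored with a short numerical sketch. It assumes NumPy, the Kraus operators of Equation 7 with damping probability 1/2, and a particular family of inputs: two equally weighted real pure states placed symmetrically about the undamped state, parametrized by the Hilbert-space angle between them. The function names and the scanning grid are illustrative choices. The sketch does not search over all orthogonal ensembles; it only evaluates the orthogonal pair at 90° and scans the symmetric non-orthogonal pairs.

```python
import numpy as np

def entropy_bits(rho):
    """Von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # treat 0 log 0 as 0
    return float(-np.sum(evals * np.log2(evals)))

def amplitude_damping(rho, lam):
    """Apply Equation 7; basis order is (up, down)."""
    a1 = np.array([[np.sqrt(1.0 - lam), 0.0], [0.0, 1.0]])
    a2 = np.array([[0.0, 0.0], [np.sqrt(lam), 0.0]])   # sqrt(lam) |down><up|
    return a1 @ rho @ a1.T + a2 @ rho @ a2.T

def pair_chi(alpha_deg, lam=0.5):
    """chi (bits) for two equally likely real pure inputs whose
    Hilbert-space angle is alpha_deg, symmetric about |down>."""
    half = np.deg2rad(alpha_deg) / 2.0
    kets = [np.array([sign * np.sin(half), np.cos(half)]) for sign in (1.0, -1.0)]
    outs = [amplitude_damping(np.outer(k, k), lam) for k in kets]
    avg = 0.5 * (outs[0] + outs[1])
    return entropy_bits(avg) - 0.5 * (entropy_bits(outs[0]) + entropy_bits(outs[1]))

# Orthogonal pair (alpha = 90 degrees): the "right" and "left" states.
chi_orthogonal = pair_chi(90.0)

# Scan non-orthogonal symmetric pairs on an assumed grid of angles.
alphas = np.arange(60.0, 90.25, 0.25)
chis = np.array([pair_chi(a) for a in alphas])
best_alpha = float(alphas[np.argmax(chis)])
best_chi = float(chis.max())
print(chi_orthogonal, best_alpha, best_chi)
```

Under these assumptions the scan should land near the values quoted in the text: about 0.4567 bits for the orthogonal pair and a maximum close to 0.4717 bits at an angle of roughly 80°.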

Why is this? Recall that $\chi$ is the average relative entropy "distance"
from the average signal state to the individual signal states. This distance
function grows larger near the boundary of the Bloch sphere, so that, for
example, the relative entropy distance between distinct pure states is infinite.
Thus, despite the appearance in Figure 1, the relative entropy distances for
the ensemble of $\rho_0$ and $\rho_1$ are greater than those for the ensemble of $\rho_\rightarrow$ and
$\rho_\leftarrow$.

4 Changing the ensemble 

In this section we will prove some useful results that will enable us to further 
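characterize the optimal ensembles for a given set $\mathcal{A}$ of available states.

Suppose as before that the signal state $\rho_k \in \mathcal{A}$ appears in our ensemble
with probability $p_k$, yielding an average state $\rho$. Let $\sigma$ be some other density
operator, which we will call the "alternate" state. Then we can calculate the
average relative entropy distance of the signal states from $\sigma$:

$$\begin{aligned}
\sum_k p_k \mathcal{D}(\rho_k\|\sigma) &= \sum_k p_k \left(\mathrm{Tr}\,\rho_k\log\rho_k - \mathrm{Tr}\,\rho_k\log\sigma\right) \\
&= \sum_k p_k \left(\mathrm{Tr}\,\rho_k\log\rho_k - \mathrm{Tr}\,\rho_k\log\rho\right) + \left(\mathrm{Tr}\,\rho\log\rho - \mathrm{Tr}\,\rho\log\sigma\right) \\
&= \sum_k p_k \mathcal{D}(\rho_k\|\rho) + \mathcal{D}(\rho\|\sigma)
\end{aligned}$$

$$\sum_k p_k \mathcal{D}(\rho_k\|\sigma) = \chi + \mathcal{D}(\rho\|\sigma). \tag{8}$$
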

This useful identity, first given by Donald [10], has a number of implications.
For example,

• For any ensemble and any $\sigma$,

$$\sum_k p_k \mathcal{D}(\rho_k\|\sigma) \geq \chi \tag{9}$$

with equality if and only if $\sigma = \rho$.

• From the previous point it follows that

$$\chi = \min_\sigma \left( \sum_k p_k \mathcal{D}(\rho_k\|\sigma) \right) \tag{10}$$

where the minimum is taken over all density operators $\sigma$.
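
The identity in Equation 8 lends itself to a quick numerical check. The sketch below assumes NumPy; the random sampling of density matrices (a normalized product of a complex Gaussian matrix with its adjoint), the seed, the dimension, and the probabilities are all illustrative assumptions chosen only to produce full-rank test states.

```python
import numpy as np

def entropy_bits(rho):
    """Von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # treat 0 log 0 as 0
    return float(-np.sum(evals * np.log2(evals)))

def relative_entropy_bits(rho, sigma):
    """D(rho||sigma) in bits; this sketch assumes sigma has full rank."""
    s_vals, s_vecs = np.linalg.eigh(sigma)
    log_sigma = s_vecs @ np.diag(np.log2(s_vals)) @ s_vecs.conj().T
    return float(-entropy_bits(rho) - np.trace(rho @ log_sigma).real)

def random_density_matrix(dim, rng):
    """Random full-rank density matrix (illustrative sampling only)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

rng = np.random.default_rng(0)
probs = [0.2, 0.5, 0.3]
states = [random_density_matrix(3, rng) for _ in probs]
avg = sum(p * r for p, r in zip(probs, states))
sigma = random_density_matrix(3, rng)      # the "alternate" state

chi = sum(p * relative_entropy_bits(r, avg) for p, r in zip(probs, states))
lhs = sum(p * relative_entropy_bits(r, sigma) for p, r in zip(probs, states))
rhs = chi + relative_entropy_bits(avg, sigma)
print(lhs, rhs, chi)
```

The first two printed values are the two sides of Equation 8, and the first should not be smaller than the third, in line with Equation 9.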

Now we will use our identity to consider how the value of $\chi$ would change
if we were to modify our ensemble. In particular, we can introduce a new
state $\rho_0$ with probability $\eta$, shrinking the other probabilities to maintain
normalization. We may conveniently refer to our ensembles as the "original"
and "modified" ensembles, as summarized in the following table:

ensemble         original     modified
signal states    $\rho_k$     $\rho_k$, $\rho_0$
probabilities    $p_k$        $(1-\eta)p_k$, $\eta$
average state    $\rho$       $\rho'$
Holevo bound     $\chi$       $\chi'$

where

$$\rho' = (1-\eta)\rho + \eta\rho_0 \tag{11}$$

$$\chi = \sum_k p_k \mathcal{D}(\rho_k\|\rho) \tag{12}$$

$$\chi' = (1-\eta)\sum_k p_k \mathcal{D}(\rho_k\|\rho') + \eta\,\mathcal{D}(\rho_0\|\rho'). \tag{13}$$

We wish to find how the Holevo bound changes, that is, we wish to make
an estimate of $\Delta\chi = \chi' - \chi$.

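Begin with the expression for $\chi'$ and apply Equation 8, choosing the
original ensemble and letting the modified average state $\rho'$ play the role of
the alternate state. This yields

$$\begin{aligned}
\chi' &= (1-\eta)\left(\chi + \mathcal{D}(\rho\|\rho')\right) + \eta\,\mathcal{D}(\rho_0\|\rho') \\
&= \chi + \eta\left(\mathcal{D}(\rho_0\|\rho') - \chi\right) + (1-\eta)\,\mathcal{D}(\rho\|\rho') \\
\Delta\chi &= \eta\left(\mathcal{D}(\rho_0\|\rho') - \chi\right) + (1-\eta)\,\mathcal{D}(\rho\|\rho').
\end{aligned}$$

Therefore,

$$\Delta\chi \geq \eta\left(\mathcal{D}(\rho_0\|\rho') - \chi\right). \tag{14}$$

This gives us a lower bound for $\Delta\chi$.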

To obtain an upper bound, we apply Equation 8 to the modified ensemble,
with the original average state $\rho$ playing the role of the alternate state.

$$\begin{aligned}
\chi' + \mathcal{D}(\rho'\|\rho) &= (1-\eta)\left(\sum_k p_k \mathcal{D}(\rho_k\|\rho)\right) + \eta\,\mathcal{D}(\rho_0\|\rho) \\
&= (1-\eta)\chi + \eta\,\mathcal{D}(\rho_0\|\rho) \\
\chi' - \chi &= \eta\left(\mathcal{D}(\rho_0\|\rho) - \chi\right) - \mathcal{D}(\rho'\|\rho)
\end{aligned}$$

And so we obtain

$$\Delta\chi \leq \eta\left(\mathcal{D}(\rho_0\|\rho) - \chi\right). \tag{15}$$

In deriving this inequality, we obviously assume that supp $\rho_0 \subseteq$ supp $\rho$. But
if this is not the case, then the inequality still holds in the sense that the
right-hand side is infinite.
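
The lower and upper bounds of Equations 14 and 15 can be illustrated numerically. The sketch below assumes NumPy and randomly generated full-rank qubit states; the sampling method, the seed, the probabilities, and the values of the mixing weight are illustrative assumptions rather than anything prescribed by the text.

```python
import numpy as np

def entropy_bits(rho):
    """Von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # treat 0 log 0 as 0
    return float(-np.sum(evals * np.log2(evals)))

def relative_entropy_bits(rho, sigma):
    """D(rho||sigma) in bits; this sketch assumes sigma has full rank."""
    s_vals, s_vecs = np.linalg.eigh(sigma)
    log_sigma = s_vecs @ np.diag(np.log2(s_vals)) @ s_vecs.conj().T
    return float(-entropy_bits(rho) - np.trace(rho @ log_sigma).real)

def random_density_matrix(dim, rng):
    """Random full-rank density matrix (illustrative sampling only)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def chi_and_average(probs, states):
    """Holevo quantity (Equation 1) and average state of an ensemble."""
    avg = sum(p * r for p, r in zip(probs, states))
    chi = entropy_bits(avg) - sum(p * entropy_bits(r) for p, r in zip(probs, states))
    return chi, avg

rng = np.random.default_rng(1)
probs = [0.4, 0.6]
states = [random_density_matrix(2, rng) for _ in probs]
rho0 = random_density_matrix(2, rng)        # the state being added
chi, rho = chi_and_average(probs, states)

rows = []
for eta in (0.5, 0.1, 0.01):
    new_probs = [(1.0 - eta) * p for p in probs] + [eta]
    chi_new, rho_new = chi_and_average(new_probs, states + [rho0])
    delta_chi = chi_new - chi
    lower = eta * (relative_entropy_bits(rho0, rho_new) - chi)   # Equation 14
    upper = eta * (relative_entropy_bits(rho0, rho) - chi)       # Equation 15
    rows.append((eta, lower, delta_chi, upper))
    print(eta, lower, delta_chi, upper)
```

Each printed row should show the change in the Holevo bound lying between the lower and upper estimates.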

It is easy to generalize these results to a situation in which we modify 
the ensemble by adding many states. Suppose the states $\rho_{0a}$ are added with
probabilities $\eta q_a$ (where the $q_a$'s form a probability distribution). Then the
above results would become

$$\eta\left(\sum_a q_a \mathcal{D}(\rho_{0a}\|\rho') - \chi\right) \leq \Delta\chi \leq \eta\left(\sum_a q_a \mathcal{D}(\rho_{0a}\|\rho) - \chi\right). \tag{16}$$

All of our subsequent results still hold in this more general situation, but 
to simplify the discussion we will phrase our arguments in terms of "single 
state" modifications of a given ensemble. 

Finally, consider states $\rho$ and $\rho_0$, and let $\rho' = (1-\eta)\rho + \eta\rho_0$. Then
$\mathcal{D}(\rho_0\|\rho')$ exists and is finite for $0 < \eta < 1$, and

• If supp $\rho_0 \subseteq$ supp $\rho$, then $\mathcal{D}(\rho_0\|\rho') \to \mathcal{D}(\rho_0\|\rho)$ as $\eta \to 0$.

• Otherwise, $\mathcal{D}(\rho_0\|\rho') \to \infty$ as $\eta \to 0$.

We see that Equations 14 and 15 are fairly "tight" lower and upper bounds for
$\Delta\chi$, because (informally speaking) the two expressions approach one another
as $\eta$ approaches zero.



5 Properties of optimal ensembles 

For a given set $\mathcal{A}$ of available states (e.g., the outputs of a noisy channel), let
$\rho_k$ and $p_k$ be the members and probabilities of the ensemble of $\mathcal{A}$-states for
which $\chi$ takes on its maximum value. Call this the "$\chi$-optimal ensemble",
and let $\rho^*$ be the average state of this ensemble. Denote $\max\chi$ by $\chi^*$. The
$\chi$-optimal ensemble has a number of important properties.

Existence. If the letter states are outputs of a noisy channel in a finite-
dimensional Hilbert space, then a $\chi$-maximizing ensemble exists.



Proof: The key result can be found in [11]: Let $\mathcal{A}$ be a convex, compact
subset of density operators on a Hilbert space of finite dimension $d$.
If the set of extremal elements of $\mathcal{A}$ is compact, then
for any $\rho \in \mathcal{A}$ there exists an ensemble of states $\{\rho_k\} \subset \mathcal{A}$ with
$\rho = \sum_k p_k \rho_k$ that maximizes $\chi$ over the set of all ensembles whose average
state is $\rho$. In other words, there exist optimal signal ensembles for a
given average state $\rho$. By Caratheodory's Theorem, since the Hilbert
space has $d$ dimensions, there are optimal ensembles (in this sense)
with no more than $d^2$ states.

We see that the conditions for the result from [11] are met. The set of
states $\mathcal{A}$ that are possible outputs of the channel is a convex, compact
set with a compact set of extremal points. For any average state $\rho$ in
$\mathcal{A}$, we can find a $\rho$-fixed optimal ensemble with $d^2$ or fewer elements.
Thus, in order to maximize $\chi$ over all possible ensembles, we only need
to consider the set of ensembles with no more than $d^2$ elements drawn
from $\mathcal{A}$. As this is a finite cartesian product of a compact set, it is
compact. As $\chi$ is a continuous function, it must achieve its maximum
in this set of ensembles. Thus, the existence of an optimal ensemble of
states in $\mathcal{A}$ is assured.

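Maximal distance property. For any state $\rho_0$ in $\mathcal{A}$,

$$\mathcal{D}(\rho_0\|\rho^*) \leq \chi^*. \tag{17}$$

Proof: We assume the existence of a state $\rho_0$ with $\mathcal{D}(\rho_0\|\rho^*) > \chi^*$. (We
allow for the possibility that $\mathcal{D}(\rho_0\|\rho^*)$ is infinite.) Let
$\rho' = (1-\eta)\rho^* + \eta\rho_0$. Since $\mathcal{D}(\rho_0\|\rho') \to \mathcal{D}(\rho_0\|\rho^*)$ as $\eta \to 0$,
we can find a value of $\eta$ so that $\mathcal{D}(\rho_0\|\rho') > \chi^*$.
Then by Equation 14,

$$\Delta\chi \geq \eta\left(\mathcal{D}(\rho_0\|\rho') - \chi^*\right) > 0.$$

That is, we can increase $\chi$ by including $\rho_0$ in the signal ensemble, which
is a contradiction.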

Maximal support property. For a $\chi$-optimal ensemble, supp $\rho^*$ = supp $\mathcal{A}$.
(By "supp $\mathcal{A}$" we mean the smallest subspace that contains supp $\rho_k$ for
any $\rho_k \in \mathcal{A}$.) In other words, any $\chi$-optimal ensemble "covers" the
support of the set of available states.

Proof: This is a corollary to the maximal distance property. If there
were a state $\rho_0 \in \mathcal{A}$ so that supp $\rho_0$ were not contained in supp $\rho^*$, then
$\mathcal{D}(\rho_0\|\rho^*)$ would be infinite, contradicting Equation 17.

Sufficiency of maximal distance property. Suppose we have an ensemble
with average state $\rho$ and a particular value of $\chi$, and suppose that

$$\mathcal{D}(\rho_0\|\rho) \leq \chi$$

for all $\rho_0 \in \mathcal{A}$. Then this must be a $\chi$-optimal ensemble. That is, the
only ensembles that have the maximal distance property are $\chi$-optimal
ensembles.

Proof: If we add a state $\rho_0$ with probability $\eta$ to the ensemble, then
from Equation 15

$$\Delta\chi \leq \eta\left(\mathcal{D}(\rho_0\|\rho) - \chi\right) \leq 0$$

so that we cannot increase $\chi$. (By Equation 16, the same would hold
if we were to add several different states instead of only one.) Thus,
$\chi = \chi^*$.

Equal distance property. Suppose $\rho_k$ is a member of a $\chi$-optimal ensemble
with probability $p_k \neq 0$. Then

$$\mathcal{D}(\rho_k\|\rho^*) = \chi^*. \tag{18}$$

In other words, all of the members of a $\chi$-optimal ensemble that appear with
non-zero probability have the same relative entropy "distance" with respect
to the average state $\rho^*$.

Proof: This is another corollary to the maximal distance property.
If $\mathcal{D}(\rho_k\|\rho^*) < \chi^*$ for any $\rho_k$ with $p_k \neq 0$, then the average relative
entropy cannot equal $\chi^*$, since by Equation 17 no other member can exceed
$\chi^*$ to compensate.

Min-max formula for $\chi^*$. From the above properties, we can show the
following formula:

$$\chi^* = \min_\rho \left( \max_{\rho_0} \mathcal{D}(\rho_0\|\rho) \right), \tag{19}$$

where the maximum is taken over all $\rho_0 \in \mathcal{A}$ and the minimum is taken
over all average states $\rho$ of ensembles of $\mathcal{A}$-states.

Proof: We first show that, for any state $\sigma$, the quantity $\max_{\rho_0} \mathcal{D}(\rho_0\|\sigma)$
is an upper bound for the value of $\chi$ for any possible ensemble. By
Equation 9 we find that

$$\chi \leq \sum_k p_k \mathcal{D}(\rho_k\|\sigma) \leq \max_{\rho_0} \mathcal{D}(\rho_0\|\sigma).$$

This will also hold for an optimal signal ensemble, for which $\chi = \chi^*$.
Thus,

$$\chi^* \leq \min_\rho \left( \max_{\rho_0} \mathcal{D}(\rho_0\|\rho) \right).$$

Next we note that the maximal distance property (together with the equal
distance property) implies that

$$\chi^* = \max_{\rho_0} \mathcal{D}(\rho_0\|\rho^*),$$

from which we can see that

$$\chi^* \geq \min_\rho \left( \max_{\rho_0} \mathcal{D}(\rho_0\|\rho) \right).$$

These two inequalities establish the formula in Equation 19.
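
As a rough illustration of the min-max formula, the amplitude damping example can be revisited numerically. The sketch below assumes NumPy, damping probability 1/2, and two simplifications that are assumptions of this sketch rather than statements from the text: the candidate output states come from real pure inputs on a one-degree grid, and the minimization runs only over diagonal states, a restriction motivated by the symmetry of the channel about the z axis. The reference ensemble is the symmetric pair with a Hilbert-space angle of 79°, an illustrative choice near the quoted 80°.

```python
import numpy as np

def entropy_bits(rho):
    """Von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # treat 0 log 0 as 0
    return float(-np.sum(evals * np.log2(evals)))

def output_state(beta, sign=1.0, lam=0.5):
    """Amplitude-damped output (Equation 7) for the input ket
    sign*sin(beta/2)|up> + cos(beta/2)|down>; basis order is (up, down)."""
    ket = np.array([sign * np.sin(beta / 2.0), np.cos(beta / 2.0)])
    rho = np.outer(ket, ket)
    a1 = np.array([[np.sqrt(1.0 - lam), 0.0], [0.0, 1.0]])
    a2 = np.array([[0.0, 0.0], [np.sqrt(lam), 0.0]])   # sqrt(lam) |down><up|
    return a1 @ rho @ a1.T + a2 @ rho @ a2.T

# Candidate available states: outputs for inputs on a one-degree grid.
betas = np.deg2rad(np.arange(0.0, 181.0, 1.0))
outs = [output_state(b) for b in betas]
ent = np.array([entropy_bits(r) for r in outs])
pop_up = np.array([r[0, 0] for r in outs])

def max_distance(a):
    """Largest D(rho0||sigma) over the candidates, for sigma = diag(a, 1 - a)."""
    d = -ent - pop_up * np.log2(a) - (1.0 - pop_up) * np.log2(1.0 - a)
    return float(d.max())

# Restricted minimization over diagonal states (assumed grid).
a_grid = np.linspace(0.05, 0.45, 161)
minmax = min(max_distance(a) for a in a_grid)

# Holevo quantity of the symmetric two-state ensemble at 79 degrees.
b = np.deg2rad(79.0)
plus, minus = output_state(b, 1.0), output_state(b, -1.0)
chi_pair = entropy_bits(0.5 * (plus + minus)) - 0.5 * (entropy_bits(plus) + entropy_bits(minus))
print(chi_pair, minmax)
```

Because the candidate grid contains the ensemble states, the restricted min-max value cannot fall below the Holevo quantity of the two-state ensemble. How closely the two printed numbers agree indicates how near that ensemble is to optimal within the restrictions of this sketch.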



These properties provide strong characterizations of an optimal signal 
ensemble for a quantum channel. Equation 19, for example, shows that $\chi^*$
can be calculated as a purely "geometric" property of the set $\mathcal{A}$, without
direct reference to any ensemble. We believe that our results are likely to 
prove useful in further investigations of the efficient use of quantum resources 
to transmit classical messages. 

Figure 1: Bloch sphere diagram for amplitude damping. The highest value
of $\chi$ for a set of orthogonal input signals is attained by an equally weighted
mixture of $|\rightarrow\rangle$ and $|\leftarrow\rangle$, but the non-orthogonal input signals $|\phi_0\rangle$ and $|\phi_1\rangle$
yield a larger value of $\chi$.